Learning Cross-lingual Representations with Matrix Factorization
Authors
Abstract
We present a matrix factorization model for learning cross-lingual representations. Using sentence-aligned corpora, the proposed model learns distributed representations by factoring the given data into language-dependent factors and one shared factor. Moreover, the model can quickly learn shared representations for more than two languages without undermining the quality of the monolingual components. The model achieves an accuracy of 88% on English-to-German cross-lingual document classification, and a Pearson correlation of 0.8 on Spanish-English cross-lingual semantic textual similarity. While these results do not beat state-of-the-art performance on these tasks, we show that the cross-lingual models are at least as good as their monolingual counterparts.
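As a rough illustration of the idea only (not the paper's actual objective or implementation), the NumPy sketch below jointly factors two sentence-aligned bag-of-words matrices into one shared sentence factor and two language-dependent word factors, using alternating ridge-regularized least squares. The variable names, the bag-of-words inputs, and the update rule are illustrative assumptions.

```python
import numpy as np

# Toy sketch: jointly factor two sentence-aligned bag-of-words matrices
# X_en (n x V_en) and X_de (n x V_de) as X_l ~= S @ W_l, where S (n x k)
# is the shared sentence factor and W_en, W_de are language-dependent
# factors. All names and the alternating least-squares updates are
# illustrative assumptions, not the paper's exact model.

rng = np.random.default_rng(0)
n, V_en, V_de, k = 200, 500, 600, 50
X_en = rng.random((n, V_en))
X_de = rng.random((n, V_de))

S = rng.normal(scale=0.1, size=(n, k))
W_en = rng.normal(scale=0.1, size=(k, V_en))
W_de = rng.normal(scale=0.1, size=(k, V_de))
lam = 1e-2  # ridge term keeps the least-squares solves stable

for _ in range(20):
    # Update language-dependent factors with the shared factor S fixed.
    A = S.T @ S + lam * np.eye(k)
    W_en = np.linalg.solve(A, S.T @ X_en)
    W_de = np.linalg.solve(A, S.T @ X_de)
    # Update the shared factor using both languages jointly.
    B = W_en @ W_en.T + W_de @ W_de.T + lam * np.eye(k)
    S = np.linalg.solve(B, W_en @ X_en.T + W_de @ X_de.T).T

# Rows of S are fixed-length cross-lingual sentence representations.
loss = np.linalg.norm(X_en - S @ W_en) ** 2 + np.linalg.norm(X_de - S @ W_de) ** 2
print(f"reconstruction loss: {loss:.2f}")
```

Because the shared factor S is estimated from both reconstruction terms at once, adding a third language in this sketch would only add another W matrix and another term to the S update, which is in line with the abstract's claim that more languages can be added without retraining the monolingual parts from scratch.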
Similar resources
Learning Cross-lingual Word Embeddings via Matrix Co-factorization
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix co-factorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matrices...
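One plausible shape for such a co-factorization objective, written here only as an illustrative sketch (the cited paper's exact losses and constraints may differ), factorizes each monolingual matrix and ties the word factors together through translation pairs:

\[
\min_{W_s,\,C_s,\,W_t,\,C_t}\ \lVert X_s - W_s C_s^\top \rVert_F^2 + \lVert X_t - W_t C_t^\top \rVert_F^2 + \lambda \sum_{(i,j)\in\mathcal{D}} \lVert (W_s)_{i\cdot} - (W_t)_{j\cdot} \rVert_2^2
\]

Here \(X_s\) and \(X_t\) are monolingual co-occurrence matrices, the rows of \(W_s\) and \(W_t\) are word embeddings of the two languages, and \(\mathcal{D}\) is a set of translation pairs supplying the cross-lingual constraint that couples the two factorizations.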
GWU NLP at SemEval-2016 Shared Task 1: Matrix Factorization for Crosslingual STS
We present a matrix factorization model for learning cross-lingual representations for sentences. Using sentence-aligned corpora, the proposed model learns distributed representations by factoring the given data into language-dependent factors and one shared factor. As a result, input sentences from both languages can be mapped into fixed-length vectors and then compared directly using the cosine similarity...
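For concreteness, comparing the shared-space vectors of an aligned sentence pair with cosine similarity could look like the toy snippet below; the vectors here are made up purely for illustration.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two fixed-length sentence vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical shared-space vectors for an aligned English/Spanish pair.
s_en = np.array([0.2, -0.1, 0.7])
s_es = np.array([0.25, -0.05, 0.6])
print(cosine(s_en, s_es))  # values near 1.0 indicate high similarity
```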
A Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data
Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature space of the source language data and that of the target language data. To address...
Annotation Projection-based Representation Learning for Cross-lingual Dependency Parsing
Cross-lingual dependency parsing aims to train a dependency parser for an annotation-scarce target language by exploiting annotated training data from an annotation-rich source language, which is of great importance in the field of natural language processing. In this paper, we propose to address cross-lingual dependency parsing by inducing latent cross-lingual data representations via matrix co...
Limitations of Cross-Lingual Learning from Image Search
Cross-lingual representation learning is an important step in making NLP scale to all the world's languages. Recent work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on similarities between images associated with these words. However, that work focused on the translation of selected nouns only. In our work, we investigate whether...
Publication date: 2016